A Learning-Based Term-Weighting Approach for Information Retrieval
نویسندگان
چکیده
One of the core components in information retrieval(IR) is the document-term-weighting scheme. In this paper,we will propose a novel learning-based term-weighting approach to improve the retrieval performance of vector space model in homogeneous collections. We first introduce a simple learning system to weighting the index terms of documents. Then, we deduce a formal computational approach according to some theories of matrix computation and statistical inference. Our experiments on 8 collections will show that our approach outperforms classic tfidf weighting, about 20%∼45%.
منابع مشابه
An Integrated and Improved Approach to Terms Weighting in Text Classification
Traditional text classification methods utilize term frequency (tf) and inverse document frequency (idf) as the main method for information retrieval. Term weighting has been applied to achieve high performance in text classification. Although TFIDF is a popular method, it is not using class information. This paper provides an improved approach for supervised weighting in the TFIDF model. The t...
متن کاملAn Adaptive Context-Based Algorithm for Term Weighting: Application to Single-Word Question Answering
Term weighting systems are of crucial importance in Information Extraction and Information Retrieval applications. Common approaches to term weighting are based either on statistical or on natural language analysis. In this paper, we present a new algorithm that capitalizes from the advantages of both the strategies by adopting a machine learning approach. In the proposed method, the weights ar...
متن کاملWeighting in Information Retrieval Using Genetic Programming: A Three Stage Process
This paper presents term-weighting schemes that have been evolved using genetic programming in an adhoc Information Retrieval model. We create an entire term-weighting scheme by firstly assuming that term-weighting schemes contain a global part, a term-frequency influence part and a normalisation part. By separating the problem into three distinct phases we reduce the search space and ease the ...
متن کاملImproving automatic bug assignment using time-metadata in term-weighting
Assigning newly reported bugs to project developers is a time-consuming and tedious task for triagers using the traditional manual bug triage process. Previous efforts for creating automatic bug assignment systems use machine learning and information-retrieval techniques. These approaches commonly use tf-idf, a statistical computation technique for weighting terms based on term frequency. Howev...
متن کاملAdaptive Term Weighting through Stochastic Optimization
Term weighting strongly influences the performance of text mining and information retrieval approaches. Usually term weights are determined through statistical estimates based on static weighting schemes. Such static approaches lack the capability to generalize to different domains and different data sets. In this paper, we introduce an on-line learning method for adapting term weights in a sup...
متن کامل